Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning
Image-based 3D shape retrieval (IBSR) aims to find the 3D shape corresponding to a given 2D image in a large 3D shape database. The common routine is to map 2D images and 3D shapes into an embedding space and define (or learn) a shape similarity measure. While metric learning with some adaptation techniques seems to be a natural solution to shape similarity learning, its performance is often unsatisfactory for fine-grained shape retrieval. In this paper, we identify the source of the poor performance and propose a practical solution to the problem. We find that the shape difference between a negative pair is entangled with the texture gap, making metric learning ineffective in pushing away negative pairs. To tackle this issue, we develop a geometry-focused multi-view metric learning framework empowered by texture synthesis. The synthesis of textures for 3D shape models creates hard triplets, which suppress the adverse effects of rich texture in 2D images, thereby pushing the network to focus more on discovering geometric characteristics. Our approach shows state-of-the-art performance on the recently released large-scale 3D-FUTURE [1] repository, as well as on three widely studied benchmarks: Pix3D [2], Stanford Cars [3], and CompCars [4].
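The role of hard triplets can be illustrated with a minimal sketch of a standard hinge-style triplet loss. This is not the paper's actual loss or architecture, which the abstract does not specify. The function names, the margin value, and the toy 2-D embeddings are all hypothetical. The "hard negative" is assumed here to be a different shape whose rendering shares the query's texture, so that it lands close to the query in embedding space.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D embeddings, chosen only for illustration.
image_query = [0.0, 0.0]     # embedding of the 2D query image
matching_shape = [0.1, 0.0]  # embedding of the correct 3D shape
easy_negative = [1.0, 0.0]   # wrong shape that is already far from the query
hard_negative = [0.2, 0.0]   # wrong shape assumed to share the query's texture

easy = triplet_loss(image_query, matching_shape, easy_negative)
hard = triplet_loss(image_query, matching_shape, hard_negative)
```

In this toy setting the easy triplet already satisfies the margin and contributes no loss, while the hard triplet still produces a positive loss. Under these assumptions, only the hard triplet would give the network a training signal, and that signal has to come from geometry rather than texture.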
Review for NeurIPS paper: Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning
Additional Feedback: ****** post rebuttal ****** I do apologize for weighing the relevance issue too heavily. My major concern is that this paper would only meet the novelty bar if this problem is very relevant, as the major contributions are tightly bound to the IBSR task. I also defend my comments on the use of terms. 'Saliency' in R.fig1 is a bad example for justifying the word saliency. It is ambiguous in that it could refer to the sofa in the back or the chair in the front.
Review for NeurIPS paper: Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning
This is a borderline case, since it applies the principles of "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" to 3D shape retrieval using conditional GANs in a way that is mostly straightforward. However, I would vote for the acceptance of this paper, given that the majority of the reviewers support it, citing that "it was not done before", that "the presented approach would serve as a strong baseline against most future approaches addressing the IBSR task", and the "thoroughness of the experiments".